Computer simulations of language change notes
This website collects my personal notes on Computer simulations of language change. These notes are provided to bring full transparency to my research process. Of course, since they are only notes, they do not reflect my final thoughts on a topic, and should not be interpreted as such. To read finished papers, please consult my website. Do not use these notes as a basis for your own scientific research. Start from high-quality, peer-reviewed scientific literature instead.
As mentioned above, a consequence of viewing language as a complex adaptive system and linguistic structure as emergent (Lindblom et al. 1984, Hopper 1987) is that it focuses our attention not so much on linguistic structure itself, as on the processes that create it (Verhagen 2002). By searching for domain-general processes, we not only narrow the search for processes specific to language, but we also situate language within the larger context of human behaviour. (Bybee 2010, 7)
cognitive processes in this book
(p. 7)
1. categorisation
Categorization is domain-general in the sense that perceptual categories of various sorts are created from experience independently of language.
2. chunking
It is the interaction of chunking with categorization that gives conventional sequences varying degrees of analysability and compositionality.
3. rich memory
(p. 8)
4. analogy
construction (p. 9)
syntax and semantics (p. 9)
The levels of abstraction found in a usage-based grammar are built up through categorization of similar instances of use into more abstract representations (Langacker 1987, 2000) (p. 9)
evolution of constructions (p. 9)
Change is postulated to occur as language is used rather than in the acquisition process (Chapters 6, 7 and 8). (p. 9)
In usage-based theory, where grammar is directly based on linguistic experience, there are no types of data that are excluded from consideration because they are considered to represent performance rather than competence. Evidence from child language, psycholinguistic experiments, speakers’ intuitions, distribution in corpora and language change are all considered viable sources of evidence about cognitive representations, provided we understand the different factors operating in each of the settings that give rise to the data. (p. 10)
… it should come as no surprise that much of the argumentation is based on examples that demonstrate tendencies in language change (p. 10)
structural / generative traditions (p. 15)
Langacker 1987 argues that a necessary prerequisite to forming a generalization is the accumulation in memory of a set of examples upon which to base the generalization. Once the category is formed or the generalization is made, the speaker does not necessarily have to throw away the examples upon which the generalization is based. (p. 15)
A robust finding that has emerged recently in quantitative studies of phonetic reduction is that high-frequency words undergo more change or change at a faster rate than low-frequency words. High-frequency words have a greater proportion of consonant deletion in the case of American t/d-deletion (Gregory et al. 1999, Bybee 2000b) as well as in Spanish intervocalic [ð]-deletion (Bybee 2001a). Unstressed vowels are more reduced in high-frequency words, as shown in Fidelholtz 1975 for English and Van Bergem 1995 for Dutch, and are more likely to delete (Hooper 1976). (p. 20)
Words that are used more often are exposed to the bias more often and thus undergo change at a faster rate. The leniting bias is a result of practice: as sequences of units are repeated, the articulatory gestures used tend to reduce and overlap. (p. 20)
Exemplar models provide a natural way to model this frequency effect (an early proposal is found in Moonwomon 1992). If the phonetic change takes place in minute increments each time a word is used and if the effect of usage is cycled back into the stored representation of the word, then words that are used more will accumulate more change than words that are used less. Such a proposal depends upon words having a memory representation that is a phonetic range, that is, a cluster of exemplars (Bybee 2000b, 2001, Pierrehumbert 2001), rather than an abstract phonemic representation. (p. 20)
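A minimal sketch of the mechanism described in this passage, to check that I understand it. Everything concrete here is my own assumption and not from Bybee: the function name `simulate_reduction`, the one-dimensional "duration" value standing in for phonetic detail, the fixed cloud size, the size of the leniting increment, and the two hypothetical words with token counts 90 and 10.

```python
import random

def simulate_reduction(token_counts, steps=10_000, bias=0.001, cloud_size=50, seed=0):
    """Return the mean 'duration' of each word's exemplar cloud after `steps` uses.

    token_counts: hypothetical relative token frequencies, e.g. {"high": 90, "low": 10}.
    bias: assumed size of the leniting increment applied on every production.
    """
    rng = random.Random(seed)
    words = list(token_counts)
    weights = [token_counts[w] for w in words]
    # Every word starts with identical, unreduced exemplars (duration 1.0).
    clouds = {w: [1.0] * cloud_size for w in words}
    for _ in range(steps):
        # A word is used with probability proportional to its token frequency.
        word = rng.choices(words, weights=weights)[0]
        # Production starts from a randomly chosen stored exemplar.
        target = rng.choice(clouds[word])
        # Each use shortens the produced token by a minute increment.
        produced = max(0.0, target - bias)
        # The produced token is cycled back into memory, replacing an old exemplar.
        clouds[word][rng.randrange(cloud_size)] = produced
    return {w: sum(c) / len(c) for w, c in clouds.items()}

means = simulate_reduction({"high": 90, "low": 10})
print(means)
```

With these assumed settings the frequent word should end up with the more reduced cloud, simply because it passes through the production-and-storage loop more often. No frequency-specific rule is needed.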
Exemplar models allow a natural expression of several effects of high token frequency: because exemplars are strengthened as each new token of use is mapped onto them, high-frequency exemplars will be stronger than low-frequency ones, and high-frequency clusters – words, phrases, constructions – will be stronger than lower frequency ones. (p. 24)
↓ effects (p. 24)
↳ morphological stability
Assuming that regularization occurs when an irregular form is not accessed and instead the regular process is used, it is less likely that high-frequency inflected forms would be subject to regularization. (p. 25)
The more frequent of the members of a paradigm tends to serve as the basis of new analogical formations; thus the singular of nouns is the basis for the formation of a new plural (cow, cows) rather than the plural serving as the basis for a new singular (kine [the old plural of cow] does not yield a new singular *ky). Similarly, the present form serves as the basis for a regularized past and not vice versa. (p. 25)
Second, consider the way new constructions arise. New constructions are specific exemplars of more general existing constructions that take on new pragmatic implications, meanings, or forms due to their use in particular contexts. (p. 28)
(p. 31-32)
rich memory representation
↓
phonetics
morphology
speaker’s experience
↓
phonetic variation
morphologically complex words
syntax
Other implications of exemplar representation for constructions are discussed in subsequent chapters.
chunk (p. 34)
↳ repetition (p. 34)
Note that repetition is necessary, but extremely high frequency in experience is not. Chunking has been shown to be subject to the Power Law of Practice (Anderson 1982), which stipulates that performance improves with practice but the amount of improvement decreases as a function of increasing practice or frequency. Thus once chunking occurs after several repetitions, further benefits or effects of repetition accrue much more slowly. (p. 34)
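To make the diminishing returns concrete, here is a small sketch. The passage gives no formula; I am assuming the common power-function form T = a · n^(−b), and the parameter values `a=1.0` and `b=0.5` as well as the function name `practice_time` are hypothetical.

```python
def practice_time(n, a=1.0, b=0.5):
    """Assumed time needed to process a chunk on its n-th repetition (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return a * n ** (-b)

# Time saved by one extra repetition at different points on the curve.
gains = {n: practice_time(n) - practice_time(n + 1) for n in (1, 10, 100)}
for n, gain in gains.items():
    print(f"repetition {n} -> {n + 1}: time saved = {gain:.4f}")
```

The saving from one more repetition shrinks as n grows, which matches the point that the first few repetitions matter most for chunking.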
In general experience as well as in language, it is usually the case that the larger the chunk, the less often it will occur. (p. 35)
While language users constantly acquire more and larger chunks of language, it is not the case that in general the language acquisition process proceeds by moving from the lowest level chunks to the highest. Even if children start with single words, words themselves are composed of smaller chunks (either morphemes or phonetic sequences), which only later may be analysed by the young language user. In addition, however, children can acquire larger multiword chunks without knowing their internal composition (Peters 1983) (p. 35)
All sorts of conventionalized multiword expressions, from prefabricated expressions to idioms to constructions, can be considered chunks for the purposes of processing and analysis. (p. 35)
status of a chunk in memory (p. 36)
↓
no chunk
↕
weak chunk
↕
frequent chunk
On the high-frequency end of the continuum, chunks such as grammaticalizing phrases or discourse markers do lose their internal structure and the identifiability of their constituent parts; see section 3.4.2 for discussion. (p. 36)
↳ source of constructions
phonetic reduction (p. 37)
In addition, we must note that words that are used more often in a context favourable to reduction will also undergo more reduction. (p. 37)
factors (p. 38)
differential reduction within chunks (p. 43)
(p. 45)
| compositionality | analysability |
|---|---|
| a semantic measure | |
| the degree of predictability of the meaning of the whole from the meaning of the component parts (Langacker 1987) | ‘recognition of the contribution that each component makes to the composite conceptualization’ |
| gradient | |
Derived words can be compositional or not: compare hopeful, careful and watchful, which have fairly predictable meanings based on the meanings of the noun base and suffix, to awful and wonderful, which are less compositional since awful indicates a negative evaluation not present in the noun awe and wonderful indicates a positive evaluation not necessarily present in wonder.
(p. 45)
As we noted in Chapter 2, an idiom such as pull strings is not fully compositional in that it has a metaphorical meaning, but it is analysable, in the sense that an English speaker recognizes the component words, as well as their meanings and relations to one another and perhaps activates all this in the interpretation of the idiom. Similarly, compounds such as air conditioning or pipe cleaner are analysable in that we recognize the component words; however, as is well known, the interpretation of compounds is highly context-dependent and thus they are not usually fully compositional (Downing 1977).
(p. 45)
Hay demonstrates through several experiments that the derived words that are more frequent than their bases are less compositional or less semantically transparent than complex words that are less frequent than their bases.
(p. 46)
Thus entice is more frequent than enticement; eternal is more frequent than eternally; top is more frequent than topless. However, there are also cases where the reverse is true: diagonally is more frequent than diagonal; abasement is more frequent than abase and frequently is more frequent than frequent. (p. 46)
Hay shows that simple token frequency does not correlate with the results of either experiment, as the claim in Bybee 1985 would predict. In Bybee 1985 I proposed that loss of analysability and semantic transparency were the result of the token frequency of the derived word. Hay has improved on this claim by showing the relevant factor to be relative frequency, at least at the frequency levels she studied. My suspicion is that at extremely high token frequencies, loss of analysability and transparency will occur independently of relative frequency. (p. 46, emphasis mine)
Processing morphologically complex words (p. 47)
full transparency
single unit + transparency
fully opaque
analysability loss (p. 48)
contributions to autonomy
grammaticalization (p. 50-51)
constructions (p. 76)
While everyone who works on constructions agrees that they cover everything from monomorphemic words, to complex words, to idioms, all the way up to very general configurations such as ‘the passive construction’ (because they are all form–meaning pairings), the term is usually applied to a morphosyntactically complex structure that is partially schematic. (p. 76)
(p. 76-77)
we can note that it is not just the idiomatic portions of language that show a strong interaction of specific lexical items with grammatical structures. Even what must be regarded as fairly general syntactic structures, such as clausal complements, depend heavily upon the specific verb of the main clause. Thus think takes an ordinary finite clause (I think it’s going to snow) while want takes an infinitive clause (I want it to snow) and see takes a gerundial complement (I saw him walking along). The argument for constructions is that the interaction of syntax and lexicon is much wider and deeper than the association of certain verbs with certain complements. (p. 77)
construction-based grammar? (p. 77)
constructions and valence (p. 78)
Why constructions? (p. 78)
most important property of constructions (p. 78-79)
↳ lexical items (p. 79)
prototype effects (p. 79)
For one thing, the fact that exemplars contain full detail of the percept (whether it be a bird or an utterance) allows for categorization by a number of features, not just those that are contrastive. For instance, a more prototypical bird is small – the size of a sparrow or a robin – while large birds are less prototypical, even though size is not a distinguishing feature of birds. (p. 79)
Given that constructions are conventional linguistic objects and not natural objects that inherently share characteristics, it seems that frequency of occurrence might significantly influence categorization in language. Considering also that using language is a matter of accessing stored representations, those that are stronger (the more frequent ones) are accessed more easily and can thus more easily be used as the basis of categorization of novel items. Because of this factor, a high-frequency exemplar classified as a member of a category is likely to be interpreted as a central member of the category, or at least its greater accessibility means that categorization can take place with reference to it. (p. 89-90)
Bybee says that she will give evidence. I guess this will be like lexical recognition tasks
Incoming exemplars are placed in semantic space closer to or farther from strong exemplars depending upon their degree of similarity. Categorization is probabilistic along the two dimensions. On some occasions categorization can be driven by similarity to a member of lesser frequency if there is greater similarity to this less frequent member (Frisch et al. 2001). However, the probabilistic interaction of frequency and similarity will result in a category whose central member is the most frequent member. (p. 80)
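A rough sketch of how frequency and similarity could interact probabilistically. This is my own toy formalisation, not Bybee's: I assume a one-dimensional semantic space, an activation of frequency × exp(−sensitivity × distance), and a choice probability proportional to activation. The names `reference_probabilities` and `sensitivity` and all positions and frequencies are hypothetical.

```python
import math

def reference_probabilities(item, exemplars, sensitivity=2.0):
    """Probability that each stored exemplar serves as the reference point for `item`.

    exemplars: {name: (position_in_semantic_space, token_frequency)}.
    """
    activation = {
        # Stronger (more frequent) and more similar (closer) exemplars are more active.
        name: freq * math.exp(-sensitivity * abs(item - pos))
        for name, (pos, freq) in exemplars.items()
    }
    total = sum(activation.values())
    return {name: a / total for name, a in activation.items()}

# Hypothetical category with one strong and one weak exemplar.
exemplars = {"frequent": (0.0, 25), "rare": (3.0, 1)}

# A novel item equally distant from both: frequency decides.
near_middle = reference_probabilities(1.5, exemplars)
# A novel item identical to the rare exemplar: similarity can override frequency.
near_rare = reference_probabilities(3.0, exemplars)
print(near_middle)
print(near_rare)
```

Under these assumptions most novel items are still attracted to the frequent exemplar, so over many categorisation events it ends up as the centre of the category.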
schematicity (p. 80)
slot-fillers (p. 80)
highly schematic categories (p. 81)
restrictive nature of slots (p. 81)
| slot-filler | frequency |
|---|---|
| crazy | 25 |
| nuts | 7 |
| mad | 4 |
| up the wall | 2 |
| out of my mind | 1 |
| over the edge | 1 |
| Salieri-mad | 1 |
(p. 81)
Why would crazy be the adjective that leads the march in this case? It is the most frequent adjective in this semantic domain. It is less serious than mad in its ‘insane’ meaning, because for American speakers crazy does not necessarily indicate a clinical condition and so it is more appropriate to the hyperbolic use.
(p. 82-83)
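For my own bookkeeping, a short sketch that derives the token frequency, the type frequency and the share of the most frequent filler from the slot-filler table above. The counts are copied from that table; the variable names are mine.

```python
# Slot-filler counts copied from the table above.
slot_fillers = {
    "crazy": 25,
    "nuts": 7,
    "mad": 4,
    "up the wall": 2,
    "out of my mind": 1,
    "over the edge": 1,
    "Salieri-mad": 1,
}

token_frequency = sum(slot_fillers.values())   # how often the slot is filled at all
type_frequency = len(slot_fillers)             # how many different fillers occur
central, central_count = max(slot_fillers.items(), key=lambda kv: kv[1])
share = central_count / token_frequency        # proportion taken by the top filler
hapaxes = [filler for filler, n in slot_fillers.items() if n == 1]

print(token_frequency, type_frequency, central, round(share, 2), hapaxes)
```

This gives 41 tokens over 7 types, with crazy accounting for about 61% of the tokens and three fillers occurring only once.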
abstract analyses (p. 84)
↕
exemplar categorisation (p. 84)
quedarse-survey (p. 85)
Four categories
central: immóvil
1. synonyms / near-synonyms
2. metaphors that result in similar meanings
3. hyperbolic expressions
4. shared features
5. socially-informed inferential associations
clusters (p. 87)
diachronic development (p. 90)
So far we have examined highly focused categories that are organized around a central member and show high degrees of similarity among the members. These would be less schematic categories, due to their narrow range. But other relationships among the items that occur in a position in a construction are possible as well. Exemplar learning allows categories of various sorts. Some categories are much more schematic and do not have a central high-frequency member. Others do have a high-frequency member but do not show expansion on the basis of that member. (p. 91)
productivity (p. 94)
degree of productivity for a slot (p. 94)
Increasing autonomy, which creates a new construction, has been discussed in Bybee 2003b and 2006a. Of relevance for the current discussion is the fact that when a particular instance of a construction – that is, a construction with a particular lexical item – becomes highly frequent, it is processed as a unit. As we saw in Chapter 3, the more often a sequence is processed directly as a unit, the less likely it is to activate other units or the construction to which it belongs and the more likely it is to lose its analysability. At the same time, use in particular contexts contributes to shifts in meaning, which decrease compositionality and make the former exemplar of a construction move away from its source. For example, the be going to construction arose from a purpose clause construction in which any verb could occupy the position go now occupies. Because of the semantic generality of go, it happened to be the most frequent movement verb in the purpose construction. Because of its use in context, one could infer a sense of intention to do something from it, and this became part of its meaning. As a result of its frequent access as a unit and the semantic change due to inferences in context, subject be going to verb has become a new construction independent of the purpose construction from which it arose. (p. 96)
collostructional analysis (p. 97)
how? (p. 97)
The researchers developing this method feel that it is important to take into account the overall token frequency of a lexeme in determining how expected it is in a construction, as well as the lexeme’s frequency in the construction. Thus a lexeme with an overall high token count will be judged as less attracted to a construction than one with a low frequency, all other things being equal. In addition, the calculation takes into account the lexeme’s frequency in the construction relative to other lexemes that appear in the construction. The final and fourth factor is the frequency of all the constructions in the corpus.
(p. 97)
In the calculation, high overall token frequency of a lexeme detracts from its Collostructional Strength. The stated reasoning is to control for general frequency effects: in order for a lexeme to have high Collostructional Strength it must occur in the construction more often than would be predicted by pure chance (Gries et al. 2005: 646). (p. 97)
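A sketch of how such a score could be computed from the four frequencies mentioned above. The passages quoted here do not say which test is used; I am assuming a one-tailed Fisher exact test on the 2×2 table and reporting −log10 of the p-value. The function name and all counts are hypothetical and only illustrate the "all other things being equal" point.

```python
from math import comb, log10

def collostructional_strength(lex_in_cxn, lex_total, cxn_total, corpus_total):
    """-log10 of the one-tailed Fisher exact p-value for lexeme-construction attraction.

    lex_in_cxn:   tokens of the lexeme inside the construction
    lex_total:    tokens of the lexeme in the whole corpus
    cxn_total:    tokens of the construction (all lexemes)
    corpus_total: tokens of all constructions in the corpus
    """
    k_max = min(lex_total, cxn_total)
    # Hypergeometric probability of seeing lex_in_cxn or more co-occurrences by chance.
    favourable = sum(
        comb(lex_total, k) * comb(corpus_total - lex_total, cxn_total - k)
        for k in range(lex_in_cxn, k_max + 1)
    )
    p_value = favourable / comb(corpus_total, cxn_total)
    return -log10(p_value)

# Two hypothetical lexemes that occur equally often in the construction (10 of 40 tokens)
# but differ in overall corpus frequency.
rare_overall = collostructional_strength(10, 20, 40, 1000)
frequent_overall = collostructional_strength(10, 200, 40, 1000)
print(rare_overall, frequent_overall)
```

With identical frequency in the construction, the lexeme that is frequent overall receives the lower score, which is exactly the property Bybee objects to.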
(p. 97)
(about solo ‘alone’)
Its highly general meaning makes it frequent in the corpora and it is also this general meaning that makes it a central member of the category of adjectives occurring in this construction. So in this case, Collostructional Analysis may give the wrong results, because a high overall frequency will give the word solo a lower degree of attraction to the construction according to this formula. (p. 98, emphasis mine)
The corpus-based analysis of Bybee and Eddington takes the most frequent adjectives occurring with each of four ‘become’ verbs as the centres of categories, with semantically related adjectives surrounding these central adjectives depending on their semantic similarity, as discussed above. Thus our analysis uses both frequency and semantics. Proponents of Collostructional Analysis hope to arrive at a semantic analysis, but do not include any semantic factors in their method. Since no semantic considerations go into the analysis, it seems plausible that no semantic analysis can emerge from it. (p. 98)
aaaaaaaaaaaaaaaaaaaaa Firth is turning in his grave
(p. 100-101)
First, observe that the adjectives that occurred in the constructions with the highest frequency have the highest Collostructional Strength and also have high ratings for acceptability. For these cases, Collostructional Strength and mere frequency make the same predictions.
For the low-frequency adjectives, however, the experiment revealed, as Bybee and Eddington had predicted, a difference between low-frequency adjectives that were semantically similar to the high-frequency ones and those that were not. This turned out to be quite significant in the experiment with the low-frequency, semantically related adjectives garnering judgements almost as high as the high-frequency adjectives. In contrast, Collostructional Analysis treats all of the adjectives that occurred with low frequency in the construction the same, giving them very low scores. Of course, the Collostructional Analysis cannot make the distinction between semantically related and unrelated since it works only with numbers and not with meaning. Thus, for determining what lexemes are the best fit or the most central to a construction, a simple frequency analysis with semantic similarity produces the best results.
(p. 100)
A reasonable interpretation of the results of the Bybee and Eddington corpus study and experiment is that lexemes with relatively high frequency in a construction are central to defining the meaning of the construction (Goldberg 2006) and serve as a reference point for novel uses of the construction. If this interpretation is correct, then the frequency of the lexeme in other uses is not important.
Gries and colleagues argue for their statistical method but do not propose a cognitive mechanism that corresponds to their analysis. By what cognitive mechanism does a language user devalue a lexeme in a construction if it is of high frequency generally? This is the question Collostructional Analysis must address. (p. 100)
grammaticalisation (p. 106)
grammaticalisation of lexical items (p. 106)
Thus going to does not grammaticalize in the construction exemplified by I’m going to the gym but only in the construction in which a verb follows to, as in I’m going to help you.
(p. 106)
origin of grammaticalization (p. 107)
effects (p. 107)
1. chunking (p. 108)
2. loss of analysability
3. loss of specific meaning / “bleaching”
Through grammaticalization we see how the grammar of a language can arise just as structure arises in a complex adaptive system. The mechanisms operating in real time as speakers and listeners use language, repeated over and over again in multiple speech events, lead to gradual change by which grammatical morphemes and their associated constructions emerge. The lexical material which consists of both form and meaning is molded into constructions which are conventionalized, repeated and undergo further change in both form and meaning. (p. 110)
increases in frequency trigger their operation, while at the same time the output of these processes (semantically more generalized meanings or a wider applicability due to inferences) leads to further frequency increases (Bybee 2009a). (p. 113)
The problem here is of course that the assumption that language can only change during acquisition is incorrect. It is worth noting that this claim is frequently made by researchers whose empirical research does not actually address this question (Janda, Newmeyer). In the next section we address the issue of child-based language change directly. For now note that it is the generativist view of grammar as discrete and unchanging in the adult, that makes this assumption necessary and which thus denies the striking unidirectionality of grammaticalization change. In contrast, if usage is the basis of grammar and change in the grammar, then there is no a priori reason why change cannot occur over an adult’s lifetime. Given that the mechanisms that propel the changes encompassed by grammaticalization are operative in all generations, there is no reason to doubt that change can be unidirectional. (p. 114)
Given that in structural and generative theories grammatical structures are discrete and independent of meaning and use, change must be regarded as an anomaly. The source of change cannot reside in usage or the grammar itself, and thus it has been proposed in these theories that change in the grammar can only come about during its transmission across generations. While many writers assume that the child language acquisition process changes language (Halle 1962, Kiparsky 1968, Lightfoot 1979 and many others both earlier and later; see Janda 2001 for more references), empirical evidence that this is actually the case is still lacking (Croft 2000). (p. 115)
However, Slobin notes that children start with the concrete notions and those most anchored in the present because these notions are cognitively the most simple, natural and accessible. Similarly, in diachrony, the most concrete notions often constitute the starting points for grammaticalization because the material the process works on comes from the basic lexicon – concrete nouns such as body parts and highly generalized verbs such as be, have and go. Thus the parallel here between ontogeny and phylogeny is the correspondence between two processes that may be only superficially similar. (p. 116)